Abstract


Multisets are sets that allow repetition of elements, and therefore account for their frequency of observation. As such, multisets pave the way to a number of interesting possibilities of both theoretical and applied nature. In the present work, after reviewing the main aspects of traditional sets, we introduce some of the main concepts and characteristics of multisets, which is followed by their generalization to take into account vectors, matrices, functions, scalar fields, and densities. A new approach is also proposed in which negative multiplicities are allowed, implying the multiset universe to become finite and well-defined, corresponding to the multiset with all support elements associated with null multiplicities. It then becomes possible to define the complement operation in multisets in a robust manner, which allows properties involving the complement, including the De Morgan theorem, to be recovered in multisets. The possibility to define set operations between mfunctions that are analogous to the traditional inner product is also proposed, paving the way to obtaining respective mfunction transformations analogous to the Fourier transform. This result also allows hybrid signal processing operations to be performed on mset mfunctions, including filtering and template matching. Relationships between the cosine similarity index and the Jaccard index are also identified, including the presentation of an intersection-based variation of the cosine index. The potential of multisets in pattern recognition and deep learning is also briefly characterized and illustrated, more specifically regarding the frequent issue of comparing clusters or densities.


‘In the bag, seashells gathered long ago resound.’ 


LdaFC 


1 Introduction 


Multisets can be informally understood as sets capable of incorporating repeated entries of the same element (e.g. [1, 2, 3, 4, 5, 6]). In a sense, they are at least as compatible with human experience as sets. For instance, when one goes to the greengrocer to buy oranges, it does matter if our bag contains 3 or 4 elements.

In the present work, we aim at providing an introduc- 
tion to this interesting area, while also briefly covering the 
Jaccard index adapted to multisets, and applications to
pattern recognition. Some new results are also described, 
including the possibility to allow negative multiplicities, 
allowing the multiset universe to be composed of null mul- 
tiplicities for all support elements. This allows the com- 
plement operation to become stable, recovering analogous 
properties to all counterparts in traditional sets, including 
the De Morgan theorem. 


We start by reviewing the main concepts and proper- 
ties of traditional sets, and then present the concept of 
multisets, as well as some of their simpler properties, also 
including several examples. The challenges implied by the definition of a universe set for multisets are briefly characterized and discussed. It is also argued that the operations of sum and subtraction between multisets correspond to one of the main distinctions between multiset and set theories.

The possibility to generalize multisets to several other mathematical structures, including vectors, matrices, functions, scalar and vector fields, as well as probability densities, is approached next, including several examples. In particular, the extension of multisets as representations of functions and scalar fields paves the way for obtaining hybrid expressions involving combinations of the set operations of union and intersection with algebraic expressions involving sum, subtraction, product and division of sets.

The interesting possibility to obtain an analogue of the 
traditional inner product functional, as well as respec- 
tively based transformations, is addressed next, with the 
introduction of the concepts of mproducts and common 


products, which leads to the understanding of the Jaccard 
index for functions as a normalized version of the common 
product. The relationship between the cosine similarity
index and the Jaccard index is also addressed from the 
perspective of multisets and multifunctions. 

We then present how the definition of the common
product between two msets or mfunctions paves the way 
to obtaining hybrid signal operations including filtering 
and template matching. 

The Jaccard index, as well as its extension to multisets 
and multiple arguments, is then briefly presented as an 
interesting manner to compare any of the mathematical 
structures mentioned above after they have been trans- 
formed into respective multisets. 

The application of multisets and the Jaccard index to 
quantify the relationship between two or more clusters is 
then described with respect to an example related to the 
iris dataset. The measurement of the separation between clusters corresponds to an important issue in pattern recognition (e.g. [7, 8]), deep learning (e.g. [9, 10, 11]), and modeling (e.g. [12, 13]).

For simplicity’s sake, the term multiset is henceforth abbreviated as mset.


2 Traditional Sets


A set is an unordered collection of items, or elements, which are not allowed to repeat. A set A with elements a, b, and c is typically represented as:


A = {a,b,c} 





The two essential properties of sets therefore are that 
the elements may appear in any order, which distinguishes
sets from vectors, and that the elements cannot be re- 
peated. 

The number of elements in a set is called its cardinality or size, being represented as |A|.

A subset B of a given set A is a set such that all of its elements are contained in A. If A contains N elements, there will be 2^N possible subsets that can be derived from it. The set containing all possible subsets of A is called its power set P_A, so that |P_A| = 2^N.

An important point about sets that is sometimes overlooked regards the fact that they always refer to a respective universe set Ω. More specifically, once this set is established, any possible set needs to be a subset of Ω. Observe that Ω can have any type of element, though the situation where the elements are homogeneous is of particular interest.

In case some sets are given but the universe set is not provided, it is still possible to estimate the respective universe set (within hypotheses and subject to incompleteness in case new sets appear) as corresponding to the union of the supplied sets.

The universe set is of fundamental importance because the operation of complement of a set is defined with respect to the universe set Ω. More specifically, the complement of a set A consists of all elements of Ω that are not part of A. The complement of a set is henceforth represented as A^C, being implicit that the operation refers to a given Ω.

Sets can be finite or infinite, as well as discrete or continuous. A finite set is any set A such that |A| < ∞. A discrete set is characterized by having all its elements corresponding to isolated points p, in the sense that each of these points possesses a neighborhood which, when intersected with the set, yields only p. Any continuous set is infinite, but discrete sets can be finite or infinite.

An interesting point regards the relationship between 
an element, let’s say ‘a’ and the set {a}. These two math- 
ematical structures are not identical because it is possible 
to include an element into {a}, but not into ‘a’. 





The empty set, represented as ∅ = {}, is a subset of any possible set.

Given two sets A and B, their union consists of a third set C containing all elements from A and B. The intersection of these two sets corresponds to a set C containing all elements that are in both A and B. A subset B of A can therefore be understood as a set such that A ∩ B = B. Any set is a subset of itself.





The difference between two sets A and B, indicated as A − B, corresponds to the set C containing all elements that are in A but are not in B.

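Given three sets A, B, and C derived from a given universe set Ω, the following properties are satisfied:

A \cup A^C = \Omega    (1)
A \cap A^C = \emptyset    (2)
A \cup \emptyset = A    (3)
A \cap \emptyset = \emptyset    (4)
A \cup A = A    (5)
A \cap A = A    (6)
A \cup B = B \cup A    (7)
A \cap B = B \cap A    (8)
A \cup (B \cup C) = (A \cup B) \cup C    (9)
A \cap (B \cap C) = (A \cap B) \cap C    (10)
A \cap (B \cup C) = (A \cap B) \cup (A \cap C)    (11)
A \cup (B \cap C) = (A \cup B) \cap (A \cup C)    (12)
(A \cup B)^C = A^C \cap B^C    (De Morgan)    (13)
(A \cap B)^C = A^C \cup B^C    (De Morgan)    (14)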


3 Multisets 


Basically, msets are sets allowing the repetition of elements, which is understood as their multiplicity or frequency. As with sets, the order of the elements is immaterial. Examples of msets include:


A = {|a, a, b, b, b, d|};
B = {|1, 1, 1, 1, 2, 2, 2|};
C = {|1, b, 2, b, a, 3, a, c, c, 1, c, 2, a|} =
  = {|1, 1, 2, 2, 3, a, a, a, b, b, c, c, c|};
D = {|a, a, b, d|}.


Observe the different symbol adopted henceforth in this work in order to emphasize the distinction between a traditional set ({ }) and an mset ({| |}).

A more compact representation of an mset A can be obtained by using 2-tuples or pairs [a, m(a)], where ‘a’ is an element and m(a) is its multiplicity, i.e. the number of times it appears in A. In the case of the above examples, we have:


A = {|a, a, b, b, b, d|} = {|[a, 2]; [b, 3]; [d, 1]|}
B = {|1, 1, 1, 1, 2, 2, 2|} = {|[1, 4]; [2, 3]|}
C = {|1, 1, 2, 2, 3, a, a, a, b, b, c, c, c|} =
  = {|[1, 2]; [2, 2]; [3, 1]; [a, 3]; [b, 2]; [c, 3]|}
D = {|a, a, b, d|} = {|[a, 2]; [b, 1]; [d, 1]|}    (15)


Though this type of representation of msets actually corresponds to a set, because it is impossible to have two identical entries, we shall maintain the ‘{| |}’ notation in order to emphasize that an mset is meant.

When referring to the multiplicity of an element, it is important to specify to which mset this is being referred. This can be done by writing m_A(a), meaning the multiplicity of the element a in the mset A.

The property analogous to inclusion in sets can be stated as follows. An mset A is included in another mset B whenever m_A(x) ≤ m_B(x) for every element x. For instance, in the case of the examples above, we have m_A(a) = 2 and m_C(a) = 3.

As with sets, it is particularly important to specify the universe of an mset. This can be done in an analogous manner as with sets. As an example, let’s obtain a possible universe set Ω for the two msets A and B above:


Ω = {a, b, d, 1, 2}    (16)


It should be kept in mind that this universe is not anal- 
ogous to the counterpart in sets, as it does not actually 
account for the possible multiplicity of the involved elements, which is unbounded through the sum operation.

It is now possible to rewrite those two msets in a more complete, though redundant manner, as follows:

A = {|[a, 2]; [b, 3]; [d, 1]; [1, 0]; [2, 0]|};
B = {|[a, 0]; [b, 0]; [d, 0]; [1, 4]; [2, 3]|};

The support of a given mset A is defined as:

S_A = \{x \mid x \in \Omega, \; m_A(x) > 0\}    (17)

For instance, the supports of the msets A, B, C, and D above are:

S_A = {a, b, d}
S_B = {1, 2}
S_C = {1, 2, 3, a, b, c}
S_D = {a, b, d}    (18)

As such, this set can be understood as containing all 
distinct elements in A. Observe that the support set pro- 


vides a useful index for identifying the possible elements 
in the respective msets. 
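
The pair representation and the support can be sketched in a few lines of Python. This is only an illustrative sketch: the use of `collections.Counter` as a container and the helper names `pairs` and `support` are assumptions made here, not part of the formal development.

```python
from collections import Counter

# Example msets from this section, stored as element -> multiplicity maps.
A = Counter("aabbbd")               # {|a, a, b, b, b, d|}
B = Counter([1, 1, 1, 1, 2, 2, 2])  # {|1, 1, 1, 1, 2, 2, 2|}
D = Counter("aabd")                 # {|a, a, b, d|}


def pairs(mset):
    """Compact [element, multiplicity] form, sorted by the element's text."""
    return sorted(mset.items(), key=lambda item: str(item[0]))


def support(mset):
    """Set of elements whose multiplicity is strictly positive."""
    return {x for x, m in mset.items() if m > 0}


print(pairs(A))
print(support(B))
```

Storing only the multiplicities makes the order of insertion irrelevant, which mirrors the fact that the order of the elements in an mset is immaterial.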


4 Multiset Operations 


An mset A is said to be included in another mset B whenever:


m_A(x) \leq m_B(x), \quad \forall x \in A    (19)


For simplicity’s sake, we will indicate this operation using the same symbol as for sets, i.e. A ⊂ B, as the type of operation can be inferred from A and B being sets or msets.

In the case of the mset examples above, we can write that D ⊂ A.

The union C of two msets A and B can be defined as:


C = A \cup B = \{| [x, m_C(x)], \; x \in A \text{ or } x \in B |\},
with m_C(x) = \max\{m_A(x), m_B(x)\}    (20)


Examples considering the msets in the beginning of Section 3 include:


A ∪ B = {|a, a, b, b, b, d, 1, 1, 1, 1, 2, 2, 2|} =
      = {|[a, 2]; [b, 3]; [d, 1]; [1, 4]; [2, 3]|}
A ∪ D = {|a, a, b, b, b, d|} = {|[a, 2]; [b, 3]; [d, 1]|}


It is interesting to observe that the resulting multiplic- 
ity of each element does not correspond to the sum of the 
respective multiplicities, but to the maximum between 
them. This is a particularly important point that deserves 
further contemplation, so we will return to it after presenting the concept of sum of two msets.

Let A and B be msets. The sum of these two msets, henceforth represented as C = A + B, is defined as:


C = A + B = \{| [x, m_C(x)], \; x \in A \text{ or } x \in B |\},
with m_C(x) = m_A(x) + m_B(x)    (21)


Figure 1 illustrates the two different ways in which the 
common elements of two msets A and B are collected into 
their respective union and sum msets. 


Figure 1: The union A ∪ B (a) and sum A + B (b) of two msets A and B typically yield quite different resulting msets. In the case of the union operation, the elements of the same type are compared, with the maximum multiplicity being incorporated into C. The sum of the two msets incorporates all the m_A(x_i) + m_B(x_i) elements into C.


Examples respective to the msets in the beginning of 
Section 3 include: 


A + B = {|a, a, b, b, b, d, 1, 1, 1, 1, 2, 2, 2|} =
      = {|[a, 2]; [b, 3]; [d, 1]; [1, 4]; [2, 3]|}
A + D = {|a, a, a, a, b, b, b, b, d, d|} = {|[a, 4]; [b, 4]; [d, 2]|}


Thus, we have that the mset operations of union and sum are related in the sense that both collect the elements from the two msets, but the way in which this is done is quite different, with the multiplicities of the mset obtained by union being necessarily smaller than or equal to those of the mset obtained by sum, i.e. m_{A∪B}(x_i) ≤ m_{A+B}(x_i).
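
The contrast between union (maximum of multiplicities) and sum (addition of multiplicities) can be checked with a minimal sketch. The function names `mset_union` and `mset_sum` are hypothetical, and `collections.Counter` is again only a convenient container assumed here.

```python
from collections import Counter

A = Counter("aabbbd")               # {|a, a, b, b, b, d|}
B = Counter([1, 1, 1, 1, 2, 2, 2])  # {|1, 1, 1, 1, 2, 2, 2|}
D = Counter("aabd")                 # {|a, a, b, d|}


def mset_union(x, y):
    """Union: each element keeps the maximum of its two multiplicities."""
    return Counter({e: max(x[e], y[e]) for e in x.keys() | y.keys()})


def mset_sum(x, y):
    """Sum: the two multiplicities of each element are added."""
    return Counter({e: x[e] + y[e] for e in x.keys() | y.keys()})


union_AD = mset_union(A, D)
sum_AD = mset_sum(A, D)

# For non-negative multiplicities the union never exceeds the sum.
assert all(union_AD[e] <= sum_AD[e] for e in sum_AD)
print(dict(union_AD), dict(sum_AD))
```

Since A and B share no elements, their union and sum coincide, whereas for A and D the sum conserves the total number of elements (10) while the union does not.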

It is interesting to consider these two operations in the context of possible respective applications. The sum of the two msets ensures conservation of the total number of elements (such as in conservative or flow-related problems), being therefore more indicated for related situations. The union of two msets can be conceptually understood as a kind of midpoint between the sum of msets and the conventional union of traditional sets. Though the union of msets will typically yield larger results than the union of traditional sets, it will not guarantee conservation of the total number of elements. A typical situation in which the union of msets can be applied is when the incorporation of the elements from the two msets is performed in terms of a choice, with the larger multiplicity being taken.

The intersection between two msets A and B can be defined as:


C = A \cap B = \{| [x, m_C(x)], \; x \in A \text{ or } x \in B |\},
with m_C(x) = \min\{m_A(x), m_B(x)\}    (22)


Examples drawn from the msets in the beginning of 
Section 3 include: 


A ∩ B = {| |}
A ∩ D = {|a, a, b, d|} = {|[a, 2]; [b, 1]; [d, 1]|}    (23)


The difference or subtraction between two msets is expressed as:


C = A - B = \{| [x, m_C(x)], \; x \in A \text{ or } x \in B |\},
with m_C(x) = \max\{m_A(x) - m_B(x), 0\}    (24)


It is interesting to observe that the restriction of not 
having negative values can be overlooked without great 
impact on the other properties and operations as ad- 
dressed in the present work. Actually, in Section 7, we 
will show that allowing negative multiplicities paves the 
way to a robust definition of the universe mset as well as 
closed subtraction operations. 





Figure 2 illustrates the intersection and difference be- 
tween two msets A and B. 


Respective examples include:

A − D = {|b, b|} = {|[b, 2]|}
D − A = {| |}    (25)
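
A minimal sketch of the intersection and difference follows, using plain dictionaries. The function names and the `allow_negative` flag are assumptions introduced here for illustration; the flag only anticipates the relaxation of the non-negativity restriction mentioned above.

```python
A = {"a": 2, "b": 3, "d": 1}   # {|a, a, b, b, b, d|}
B = {1: 4, 2: 3}               # {|1, 1, 1, 1, 2, 2, 2|}
D = {"a": 2, "b": 1, "d": 1}   # {|a, a, b, d|}


def mset_intersection(x, y):
    """Intersection: minimum multiplicity per element (zero entries dropped)."""
    result = {e: min(x.get(e, 0), y.get(e, 0)) for e in x.keys() | y.keys()}
    return {e: m for e, m in result.items() if m != 0}


def mset_difference(x, y, allow_negative=False):
    """Difference: m_x - m_y, clipped at zero unless allow_negative is set."""
    result = {}
    for e in x.keys() | y.keys():
        m = x.get(e, 0) - y.get(e, 0)
        if not allow_negative:
            m = max(m, 0)
        if m != 0:
            result[e] = m
    return result


print(mset_intersection(A, D))
print(mset_difference(A, D))
print(mset_difference(D, A, allow_negative=True))
```

With the clipping at zero, D − A is empty; when negative multiplicities are allowed, the same call yields a multiplicity of −2 for b.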





Figure 2: The intersection A ∩ B (a) and difference A − B (b) of two msets A and B typically yield quite different resulting msets. In the case of the intersection operation, the elements of the same type are compared, with the minimum respective multiplicity being incorporated into C. The difference between A and B depends on m_A(x_i) − m_B(x_i). As the result is negative in the case of the present example, no elements are incorporated into A − B.


5 Multiset Properties


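It can be shown that msets satisfy the following properties:

A \cup \emptyset = A    (26)
A \cap \emptyset = \emptyset    (27)
A \cup A = A    (28)
A \cap A = A    (29)
A \cup B = B \cup A    (30)
A \cap B = B \cap A    (31)
A \cup (B \cup C) = (A \cup B) \cup C    (32)
A \cap (B \cap C) = (A \cap B) \cap C    (33)
A \cap (B \cup C) = (A \cap B) \cup (A \cap C)    (34)
A \cup (B \cap C) = (A \cup B) \cap (A \cup C)    (35)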

Informally speaking, the above properties can be understood by identifying each repeated element in an mset with unique respective tags, in which case they would behave with respect to the operations of union and intersection exactly in the same way as common sets.
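
These properties can also be probed numerically. The sketch below is a randomized check, not a proof: it draws small msets with non-negative multiplicities over a fixed, hypothetical alphabet and tests idempotence, commutativity, associativity, and distributivity of the max/min-based union and intersection. All names are assumptions made for illustration.

```python
import random


def union(x, y):
    """Element-wise maximum of multiplicities."""
    return {e: max(x.get(e, 0), y.get(e, 0)) for e in x.keys() | y.keys()}


def intersection(x, y):
    """Element-wise minimum of multiplicities."""
    return {e: min(x.get(e, 0), y.get(e, 0)) for e in x.keys() | y.keys()}


def random_mset(rng, elements="abcde", max_mult=4):
    """Random mset over a fixed alphabet; zero multiplicities are kept."""
    return {e: rng.randint(0, max_mult) for e in elements}


def check_properties(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        A, B, C = (random_mset(rng) for _ in range(3))
        checks = [
            union(A, A) == A,
            intersection(A, A) == A,
            union(A, B) == union(B, A),
            intersection(A, B) == intersection(B, A),
            union(A, union(B, C)) == union(union(A, B), C),
            intersection(A, intersection(B, C))
            == intersection(intersection(A, B), C),
            intersection(A, union(B, C))
            == union(intersection(A, B), intersection(A, C)),
            union(A, intersection(B, C))
            == intersection(union(A, B), union(A, C)),
        ]
        if not all(checks):
            return False
    return True


print(check_properties())
```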

So, we have that msets follow all the properties in Equations 1 to 14, except those involving complementation. Indeed, the definition of the complement of an mset has been a challenging issue (e.g. [6]), which has to do with the fact that there is no bound to the size of possible msets generated by making successive additions of any non-empty mset. For instance, we can write:


A = {|a|}
A = A + A = {|a, a|}
A = A + A = {|a, a, a, a|}    (36)





Interestingly, as will be presented in Section 7, the universe mset actually corresponds to an mset with all multiplicities equal to zero. When applied to functions represented as msets, this means that the null function is the respective universe.

Because no bounded universe is available when the multiplicities are restricted to non-negative values, the useful De Morgan properties, as well as other related results, do not hold for msets in that setting. In addition, there are relatively few properties involving the sum and subtraction of msets. However, when negative values are allowed for the subtraction between msets, the universe mset consists of taking all the multiplicities as zero, and the complement operation becomes the change of sign of the multiplicities (see Equation 40).

In a sense, it is these two operations, sum and subtraction, that differentiate msets from sets because, as commented above, msets behave like sets with respect to their union and intersection when the elements are tagged.

Though we gain two new operations when working with 
msets, these operations imply some specific challenges to 
be addressed. In the present work, we will simply assume 
that the complement of msets is not available.


6 Multisets, Vectors, Matrices 


In this section we will discuss the relationship between msets and vectors. First, we recall that the elements in a vector are expected to follow a well-determined order as indicated by their indices. For instance, in the case of the vector in R^5:

v = [3, 2.5, π, 0, −1]


we have five indices i = 1, 2, ..., 5, so that we can specify the respective element values as v[1] = 3, v[2] = 2.5, v[3] = π, v[4] = 0, and v[5] = −1.

By understanding the values of the components of a vector as generalized multiplicities, we can immediately derive the following mset from the above vector:


V = {|[1, 3]; [2, 2.5]; [3, π]; [4, 0]; [5, −1]|}





Therefore, we have that an mset can be derived from any vector, but that a vector can be obtained from an mset only if its elements are ordered in some manner, e.g. by taking their respective values instead of understanding them as labels. This situation becomes more evident when one considers non-numeric elements. As such, msets can be used to study the elements of vectors without taking into account their relative position along the vector.

It is also interesting to contemplate the relationship between the above discussion and the traditional sets containing multiplicities. More specifically, we can write the set containing all multiplicities in the vector v above, here written V_S to distinguish it from the mset V, as:


V_S = {3, 2.5, π, 0, −1} = {0, 3, 2.5, −1, π} = etc.


Though the mset V and the set V_S are very similar, they are not identical because in V the correspondence between the elements and the respective multiplicities is maintained. This difference becomes critically important in case we want to apply the operations of addition and subtraction between msets, which would be otherwise impossible in the case of sets because we would not know which element of one set should be added to which element in the other set.

By representing vectors as msets, we not only preserve the operations of sum and subtraction, but also incorporate the possibility of defining intersections and unions between any two vectors.
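
As a small illustration, the sketch below converts two vectors into msets keyed by their 1-based indices and applies union, intersection, and sum component by component. The second vector `w` and the helper names are hypothetical choices made here.

```python
import math


def vector_to_mset(v):
    """Map a vector to an mset whose elements are the 1-based indices."""
    return {i: value for i, value in enumerate(v, start=1)}


def combine(x, y, op):
    """Apply a binary operation to the multiplicities of matching elements."""
    return {k: op(x[k], y[k]) for k in x}


v = [3, 2.5, math.pi, 0, -1]
w = [1, 4, 0, 0, 2]              # hypothetical second vector

V, W = vector_to_mset(v), vector_to_mset(w)
union = combine(V, W, max)
intersection = combine(V, W, min)
total = combine(V, W, lambda a, b: a + b)

print(union)
print(intersection)
print(total)
```

Because the index is kept as the element label, the correspondence needed for sum and subtraction is never lost, which is the point made above about the difference between the mset and the plain set of values.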

Another interesting possibility is to incorporate new 
operators for multiplication and division into the mset 
framework, which can be done straightforwardly, while 
avoiding divisions by zero. 

Interestingly, it is also possible to obtain mset representations from matrices or even other more sophisticated mathematical structures such as tensors. In the case of matrices, this can be done by mapping the indices i = 1, 2, ..., N_i and j = 1, 2, ..., N_j into a single index k, e.g.:

k \leftarrow N_i (j - 1) + i - 1    (37)

so that the matrix becomes a vector, which can then be transformed into the respective mset as described above. Observe that, though the obtained msets would not directly provide the respective indices of the elements, these can nevertheless be recovered from the unified index.
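
The index mapping and its inverse can be sketched as follows, assuming 1-based indices i and j and the form k = N_i (j − 1) + i − 1, which yields k = 0, 1, ..., N_i N_j − 1. The function names are hypothetical.

```python
def to_linear(i, j, n_i):
    """Map 1-based matrix indices (i, j) to the unified index k."""
    return n_i * (j - 1) + i - 1


def from_linear(k, n_i):
    """Recover the 1-based indices (i, j) from the unified index k."""
    j_minus_1, i_minus_1 = divmod(k, n_i)
    return i_minus_1 + 1, j_minus_1 + 1


def matrix_to_mset(matrix):
    """Turn a matrix (list of rows) into an mset keyed by the unified index."""
    n_i = len(matrix)
    n_j = len(matrix[0])
    return {
        to_linear(i, j, n_i): matrix[i - 1][j - 1]
        for j in range(1, n_j + 1)
        for i in range(1, n_i + 1)
    }


M = [[1, 2, 3],
     [4, 5, 6]]
print(matrix_to_mset(M))
print(from_linear(3, n_i=2))
```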


7 Functions and Scalar Fields 


The possibility to represent vectors as msets opens the way to a number of interesting possibilities. One of them is to represent discrete and continuous functions and scalar fields (vector fields can be approached as vectors of scalar fields). We develop these possibilities in the following.

We start with a discrete function as illustrated in Fig- 
ure 3. 


Figure 3: A generic discrete function f(x_i).


It is often interesting to represent such discrete func- 
tions in terms of sums of Dirac delta functions. 

Provided that we limit the extension of this function along the horizontal axis (related to the function support), e.g. i = 1, 2, ..., N, it can be immediately represented in terms of the vector:


\vec{f} = [f(x_i)]    (38)


which can then be transformed into the respective mset 
as described in the previous section. 

A similar approach can be applied to transform discrete scalar fields defined on more than one variable into respective msets, involving the index mapping described above.

Now, we can approach the case of continuous functions and scalar fields simply by taking the separation between the points along the horizontal axis to the limit of 0. As a consequence, the functions and scalar fields will become associated with msets with infinitely many elements, but these can still be operated by the ‘max()’, ‘min()’, ‘+’ and ‘−’ operations required by the mset operations, as well as any other function or transformation applicable to functions. For simplicity’s sake, we shall call the multisets derived from functions mfunctions.

Observe that, though all information about a function is preserved in the respective mfunction, allowing its reconstruction, the adjacency between any two points along the domain is lost in the mfunction because the order of the elements in an mset does not matter.

Let’s illustrate the above concepts and possibilities in terms of the two following functions:

g(x) = [...]
h(x) = 9 e^{-10 |x - 0.1|}    (39)


Figure 4 illustrates the two above functions as well as their union, intersection, sum, and subtraction after having been converted into respective msets.

Figure 4: Two continuous functions g(x) and h(x) of a single variable (a), and their respective mset operations (b) of union, intersection, sum, and subtraction.


These examples illustrate several interesting points. First, we have that the transformation of functions into respective msets immediately allows them to be operated by union and intersection. In addition, we observe an example in which the sum of two functions is larger than or equal to their union, as well as the possibility of the subtraction operation yielding negative values.
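
The operations on mfunctions can be sketched on sampled values. The two functions below are hypothetical stand-ins chosen here (two shifted double-exponential peaks with assumed coefficients), not the exact functions of Equation 39; the sampling grid is also an assumption.

```python
import math


def g(x):
    """Hypothetical peak centred at x = -0.1 (assumed for illustration)."""
    return 2.0 * math.exp(-10.0 * abs(x + 0.1))


def h(x):
    """Hypothetical peak centred at x = 0.1 (assumed coefficients)."""
    return 2.0 * math.exp(-10.0 * abs(x - 0.1))


# Sample both functions on a common support, here [-1, 1] with 201 points.
xs = [-1.0 + 2.0 * k / 200 for k in range(201)]
gs = [g(x) for x in xs]
hs = [h(x) for x in xs]

union = [max(a, b) for a, b in zip(gs, hs)]         # g(x) U h(x)
intersection = [min(a, b) for a, b in zip(gs, hs)]  # g(x) n h(x)
total = [a + b for a, b in zip(gs, hs)]             # g(x) + h(x)
difference = [a - b for a, b in zip(gs, hs)]        # g(x) - h(x), may be negative

print(max(union), min(difference))
```

Since both sampled functions are non-negative here, the sum is never smaller than the union, while the difference becomes negative wherever h(x) exceeds g(x).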

Figure 5 shows the two additional operations of product 
and quotient between the two functions g(x) and h(x) 
above, avoiding the divisions by zero. 


It is interesting to observe the potential of the operations of union and intersection in producing sharp derivatives and discontinuities, which provides an interesting manner of representing an ample range of function types as combinations of these operations, not to mention the operations of sum, subtraction, product and quotient.

Figure 5: The product g(x)*h(x) and quotient g(x)/h(x) of the two functions g(x) and h(x) of a single variable.


Consequently, it becomes an interesting possibility to develop transformations of functions, analogous to the Fourier transform, considering not only series of basis functions, but also intersections and/or unions, and/or other possible operations between msets. One possible interesting benefit would be to become able to express functions with discontinuous derivatives as combinations of functions that are completely smooth. Also, it should be observed that the operations of sum and subtraction are linear, while the minimum and maximum are not.

Indeed, the above developments also allow new functions to be obtained as combinations of logic operations such as the union and intersection and numeric operations such as sum, subtraction, product, and division. For instance, it becomes possible to write expressions such as:


r(x) = [...] + g(x)
s(x) = [...] − h(x))
t(x) = [...] − [g(x) ∪ h(x)]

These three functions are illustrated in Figure 6 assuming the functions in Equation 39.


Figure 6: The functions r(x), s(x) and t(x) obtained through mset operations.


Seen from this perspective, the complement of a multiset becomes the operation of sign change, in the sense that r(x)^C would become −r(x). Indeed, it can be verified that the De Morgan properties hold in this case, i.e.:


-[g(x) \cap h(x)] = [-g(x)] \cup [-h(x)]
-[g(x) \cup h(x)] = [-g(x)] \cap [-h(x)]
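
These two identities reduce to the facts that −min(a, b) = max(−a, −b) and −max(a, b) = min(−a, −b), which can be confirmed on sampled values. The two signals below are hypothetical and chosen here only because they take both positive and negative values.

```python
import math


def de_morgan_holds(gs, hs):
    """Check both sign-change De Morgan identities sample by sample."""
    first = all(-min(a, b) == max(-a, -b) for a, b in zip(gs, hs))
    second = all(-max(a, b) == min(-a, -b) for a, b in zip(gs, hs))
    return first and second


xs = [k / 100 for k in range(101)]
gs = [math.sin(2 * math.pi * x) for x in xs]        # hypothetical signal
hs = [0.5 * math.cos(4 * math.pi * x) for x in xs]  # hypothetical signal

print(de_morgan_holds(gs, hs))
```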


The potential of these hybrid constructions is as large as our imagination, because it allows the incorporation of many of the concepts, structures, and properties of arithmetic, set theory, and also logic (which is directly analogous to set theory).

It should also be observed that the above results could only be achieved by allowing negative values for the multiplicity in the difference operation between msets. The sign change therefore recovers many properties analogous to those involving complements of sets, and more. As a consequence, the universe mset Ω becomes as follows:


Ω = {|[x_1, 0]; [x_2, 0]; ...; [x_N, 0]|}    (40)


where x_i are the elements of the support of the msets.





Other properties analogous to those of traditional sets are also consequently recovered, including the one involving A ∪ A^C.


8 The Multiset Jaccard Indices 





The Jaccard index represents an effective and conceptually appealing manner to quantify the similarity between any two sets A and B (e.g. [14, 15, 16]), having therefore been extensively applied in a vast range of problems in several scientific and technological fields.

In its most basic form, the Jaccard index between any two sets A and B can be expressed as:


J(A, B) = \frac{|A \cap B|}{|A \cup B|}    (41)

It is possible to adapt the Jaccard index to msets by making:

J_M(A, B) = \frac{\sum_{i=1}^{N} \min(m(a_i), m(b_i))}{\sum_{i=1}^{N} \max(m(a_i), m(b_i))}    (42)

where a_i and b_i are the elements of the msets A and B, respectively, and N is the cardinality of the universe of those msets. We also have that 0 ≤ J_M(A, B) ≤ 1.

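As an example, let’s consider A = {|a, b, b, c, c, c|} and B = {|a, b, c, c, d|}. Taking the minimum and maximum multiplicities of the elements a, b, c, and d as in Equation 42, we have:


J_M(A, B) = \frac{1 + 1 + 2 + 0}{1 + 2 + 3 + 1} = \frac{4}{7}    (43)
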
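
The mset Jaccard index, taken as the sum of minimum multiplicities divided by the sum of maximum multiplicities, can be evaluated for this example with a short sketch. The function name `jaccard_mset` is hypothetical, and the exact fraction is kept by means of `fractions.Fraction`.

```python
from collections import Counter
from fractions import Fraction


def jaccard_mset(x, y):
    """Sum of minimum multiplicities over sum of maximum multiplicities."""
    elements = x.keys() | y.keys()
    numerator = sum(min(x[e], y[e]) for e in elements)
    denominator = sum(max(x[e], y[e]) for e in elements)
    return Fraction(numerator, denominator)  # undefined if both msets are empty


A = Counter("abbccc")   # {|a, b, b, c, c, c|}
B = Counter("abccd")    # {|a, b, c, c, d|}

print(jaccard_mset(A, B))
```

For these two msets the minima add up to 1 + 1 + 2 + 0 = 4 and the maxima to 1 + 2 + 3 + 1 = 7, so the index is 4/7.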
It is possible to adapt the Jaccard index to mfunctions by making:

J(A, B) = \frac{\int_S \min(m_A(x), m_B(x)) \, dx}{\int_S \max(m_A(x), m_B(x)) \, dx}    (44)

where S is the common support of the two functions or scalar fields.

As such, the Jaccard index can be understood as a functional, or mfunctional, of the two functions or scalar fields.

The Jaccard index has been enhanced and extended to functions, scalar fields, joint variations and more than 2 sets [16]. In particular, the latter type of Jaccard index for 3 sets A, B and C can be written as:

J(A, B, C) = \frac{|A \cap B \cap C|}{|A \cup B \cup C|}    (45)


9 Inner Product and Transforms 


The concept of mfunctions as proposed above has some interesting implications. In particular, it effectively endows mfunctions and scalar fields with analogues of the set operations of union and intersection, which immediately motivates several important questions, such as what are the properties involving both set and algebraic operations on mfunctions. Another possibility of particular relevance, because of its immediate relationship with the concepts of similarity and transformation, regards whether it is possible to define a set-related operation between two mfunctions which would be analogous to the inner product. In this section we propose one such operation and then use it to define transforms of mfunctions in a manner similar to the Fourier transform, but without the property of orthogonality extending over all distinct sinusoidals.

We start by recalling the inner product between two real functions f(x) and g(x):


\langle f(x), g(x) \rangle = \int_{-\infty}^{\infty} f(x) \, g(x) \, dx    (46)


Any two functions f(x) and g(x) are said to be orthogonal if and only if their inner product is zero. The prototypical example of orthogonal functions consists of sinusoidals s(x, ω, φ) = sin(ωx + φ), with ω = 2πf. It is known that:


\langle f(x), g(x) \rangle = \int_{-\infty}^{\infty} s(x, \omega, \phi) \, s(x, \omega, \phi') \, dx = 0    (47)


provided φ ≠ φ'. However, when φ = φ', we have ⟨f(x), g(x)⟩ = 1. Therefore sinusoidals are orthogonal.

Now, the similarity between a generic function f(x) and a sinusoidal s(x, ω, φ) can be readily expressed by the respective inner product, i.e.:


\langle f(x), s(x, \omega, \phi) \rangle = \int_{-\infty}^{\infty} f(x) \, s(x, \omega, \phi) \, dx    (48)

The higher the obtained value, the more similar the function f(x) is to the sinusoidal. Actually, the inner product ⟨f(x), s(x, ω, φ)⟩ corresponds directly to the Fourier transform of f(x), which provides an effective manner of expressing f(x) as a linear combination of sinusoidals, which are more effectively handled as complex exponentials.

A particularly important property of the Fourier, as 
well as other orthogonal transforms is that, given the or- 
thogonality between the basis functions, the calculation of 
the coefficients can be made independently of one another.

Having obtained the transformation coefficients for ev- 
ery possible sinusoidal, it is then possible to recover the 
original function f(x). 

Now, we develop an analogous construction regarding 
the set operations incorporated into mfunctions. 

First, we start by defining the mfunctional corresponding to the integral of the intersection between any two functions f(x) and g(x). It is proposed here that this can be done as follows:


\ll f(x), g(x) \gg = \int_{-\infty}^{\infty} s_f s_g \min(s_f f(x), s_g g(x)) \, dx    (49)

where s_f and s_g are the signs of f(x) and g(x), and p(x) = f(x) ∘ g(x) = s_f s_g min(s_f f(x), s_g g(x)) is henceforth called the mproduct of f(x) and g(x). In order to distinguish between the traditional and set-based inner products, we will refer to the latter as the common product.

It is also possible to define other functionals, such as:


f(x) \otimes g(x) = \int_{-\infty}^{\infty} \max(s_f f(x), s_g g(x)) \, dx    (50)


which is analogous to the outer product, being henceforth called the sup product.

The common product integral in Equation 49 can be verified to effectively correspond to the common area between the two functions, with the area being taken considering the signs of the functions as in that expression.

Thus, the Jaccard index for mfunctions can now be expressed as:

J(f(x), g(x)) = \frac{\ll f(x), g(x) \gg}{f(x) \otimes g(x)}    (51)

As such, the Jaccard index can be understood as the intersection-based common product (a functional), which quantifies the similarity between the two msets or mfunctions, normalized relative to the union of the mfunctions.
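
A numerical sketch of the common product, the sup product, and the resulting Jaccard index follows. It assumes a finite support [a, b] and approximates the integrals with a midpoint rule; the helper names and the number of subintervals are choices made here for illustration.

```python
import math


def sign(v):
    """Sign of a number: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def integrate(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))


def mproduct(f, g):
    """Pointwise mproduct: s_f * s_g * min(s_f * f(x), s_g * g(x))."""
    def p(x):
        fx, gx = f(x), g(x)
        sf, sg = sign(fx), sign(gx)
        return sf * sg * min(sf * fx, sg * gx)
    return p


def common_product(f, g, a, b):
    """Integral of the mproduct over the support [a, b]."""
    return integrate(mproduct(f, g), a, b)


def sup_product(f, g, a, b):
    """Integral of max(s_f * f(x), s_g * g(x)); s_f * f(x) equals |f(x)|."""
    return integrate(lambda x: max(abs(f(x)), abs(g(x))), a, b)


def jaccard(f, g, a, b):
    """Common product normalized by the sup product."""
    return common_product(f, g, a, b) / sup_product(f, g, a, b)


def f(x):
    return math.sin(2 * math.pi * x)


def neg_f(x):
    return -f(x)


print(jaccard(f, f, 0.0, 1.0), jaccard(f, neg_f, 0.0, 1.0))
```

With these definitions a function compared with itself yields an index of 1, and compared with its negative yields −1, so the index carries the sign information of the common product.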

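Let’s illustrate this mfunctional with respect to sinusoidals.
For simplicity’s sake, we shall consider that all functions
share the same support [0, T], T = 1/f.

In case f(x) = g(x) = sin(ωx), we immediately have that their
intersection is f ∩ g = f = g, from which

$$\ll f(x), g(x) \gg \; = \int_0^{T/2} \sin(\omega x)\,dx - \int_{T/2}^{T} \sin(\omega x)\,dx = \frac{2T}{\pi}$$

which, with $\omega = 2\pi/T$, corresponds to 1 once normalized by
the area of |sin(ωx)| over the support.

Now, in case f(x) = −g(x) = sin(ωx), we have f(x) ∩ g(x) = 0,
implying

$$\ll f(x), g(x) \gg \; = -\int_0^{T/2} \sin(\omega x)\,dx + \int_{T/2}^{T} \sin(\omega x)\,dx = -\frac{2T}{\pi}$$

that is, −1 under the same normalization.

For f(x) = sin(ωx) and g(x) = cos(ωx), it follows that:

$$\ll f(x), g(x) \gg \; = \int_0^{T/4} \sin(\omega x)\,dx - \int_{T/4}^{T/2} \sin(\omega x)\,dx + \int_{T/2}^{3T/4} \sin(\omega x)\,dx - \int_{3T/4}^{T} \sin(\omega x)\,dx = 0 \qquad (52)$$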


So far, the common product between the sinusoidals has 
presented a remarkable similarity with the traditional in- 
ner product, being identical for the three situations above. 
In addition, we can conclude that the sine and cosine are 
orthogonal also regarding the common product. 

However, let’s now consider that f(x) = sin(ω) and
g(x) = sin(ω + φ), with φ ≠ 0. Unlike the traditional inner
product, the common product of these two functions will no
longer be zero, being nevertheless comprised in the interval
(0,1). As a consequence, except for the case f(x) = sin(ω) and
g(x) = cos(ω), no other combination of sinusoidals will be
orthogonal to one another.
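The behaviour of the common product between sinusoidals can be probed numerically. The sketch below assumes the phase-shifted reading g(x) = sin(ωx + φ), a support [0, T] with ω = 2π/T, and a normalization by the common product of sin(ωx) with itself; all three are assumptions adopted here for illustration, as are the function names.

```python
import math


def sign(v):
    """Return the sign of v as -1, 0, or +1."""
    return (v > 0) - (v < 0)


def common_product(f, g, a, b, n=4000):
    """Midpoint-rule estimate of the integral of s_f s_g min(s_f f, s_g g) on [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        fx, gx = f(x), g(x)
        sf, sg = sign(fx), sign(gx)
        total += sf * sg * min(sf * fx, sg * gx)
    return total * h


def normalized_common_product(phi, T=1.0):
    """Common product of sin(wx) and sin(wx + phi) on [0, T], divided by
    the common product of sin(wx) with itself (assumed normalization)."""
    w = 2.0 * math.pi / T
    f = lambda x: math.sin(w * x)
    g = lambda x: math.sin(w * x + phi)
    return common_product(f, g, 0.0, T) / common_product(f, f, 0.0, T)


for phi in (0.0, math.pi / 4, math.pi / 2):
    value = normalized_common_product(phi)
    print(f"phi = {phi:.4f}  common = {value:+.4f}  cos(phi) = {math.cos(phi):+.4f}")
```

Under these assumptions the value is 1 for φ = 0 and vanishes (up to numerical error) for φ = π/2, while for φ = π/4 it lies strictly between 0 and 1 and differs from cos(φ), the value that the traditional normalized inner product would give.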





Nevertheless, it is still possible to define transformations
capable of decomposing (and recovering) a generic mfunction
f(x) in terms of a combination of basic mfunctions such as
sinusoidals. This can be readily accomplished as follows.

Let $g_i(x)$, i = 1, 2, ..., N, be the basis functions of the
transformation, and f(x) be the function to be analyzed.

The transformation coefficients can be obtained as:

$$\begin{array}{l}
\text{for } i = 1, 2, \ldots, N \\
\quad c_i = \; \ll f(x), g_i(x) \gg \; = \int_{-\infty}^{\infty} s_f s_{g_i} \min\left(s_f f(x),\, s_{g_i} g_i(x)\right) dx; \\
\quad f(x) = f(x) - g_i(x)
\end{array}$$

Each of these coefficients corresponds to the effectively
shared area between each pair of functions.

In the case of sinusoidal mfunctions, because they are
generally not orthogonal, the above coefficients will depend
on the order in which the basis functions are applied, so that
several distinct transformations can be derived for the same
function and the same basis. However, it is possible to employ
optimization methods in order to achieve specific objectives.
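The order dependence of the coefficients can be observed in a small numerical experiment. The sketch below is a literal transcription of the coefficient loop on sampled functions, with the update step read as subtracting the basis function itself; the test signal, the two-element basis, the sampling, and the function names are hypothetical choices made here, and the sketch does not assess how well the function is recovered.

```python
import math


def sign(v):
    """Return the sign of v as -1, 0, or +1."""
    return (v > 0) - (v < 0)


def common_product(fs, gs, h):
    """Discrete common product of two sampled functions with sample spacing h."""
    return h * sum(
        sign(a) * sign(b) * min(abs(a), abs(b)) for a, b in zip(fs, gs)
    )


def analyze(fs, basis, h):
    """Apply the basis functions in the given order: each coefficient is the
    common product of the current function with g_i, after which g_i is
    subtracted from the current function."""
    residual = list(fs)
    coefficients = []
    for gs in basis:
        coefficients.append(common_product(residual, gs, h))
        residual = [r - g for r, g in zip(residual, gs)]
    return coefficients


n = 4000
h = 2.0 * math.pi / n
xs = [(k + 0.5) * h for k in range(n)]
g1 = [math.sin(x) for x in xs]
g2 = [math.sin(2.0 * x) for x in xs]
f = [a + 0.5 * b for a, b in zip(g1, g2)]  # hypothetical signal to be analyzed

c_12 = analyze(f, [g1, g2], h)  # g1 applied first: returns [c_1, c_2]
c_21 = analyze(f, [g2, g1], h)  # g2 applied first: returns [c_2, c_1]
print(c_12, c_21)
```

For this particular signal the coefficient associated with sin(2x) comes out close to 2 when sin(x) is applied first and close to 4/3 when sin(2x) is applied first, whereas the coefficient of sin(x) stays close to 3 in both orders.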





The original function f(x) can then be recovered to a given
precision (depending on the number and type of basis
functions) as:

$$\begin{array}{l}
f(x) = 0; \\
\text{for } i = 1, 2, \ldots, N \\
\quad f(x) = f(x) + c_i\, g_i(x)
\end{array} \qquad (53)$$


10 The Cosine Similarity from the 
Multiset Perspective 


Given two vectors X and Y with N elements each, their cosine
similarity is commonly expressed as:

$$C_s(X, Y) = \cos(\theta) = \frac{1}{|X|\,|Y|} \sum_{i=1}^{N} X_i Y_i \qquad (54)$$

where θ is the smallest angle between the two vectors.

Typically the cosine similarity assumes all the elements to be
nonnegative.

Now, we have already seen that vectors and multisets are
closely related, in the sense that the latter can be derived
from the former. Consequently, the cosine similarity index can
be immediately understood in terms of multisets.

Let $\mathcal{X}$ and $\mathcal{Y}$ be the multisets derived from
X and Y, therefore sharing the same support with N elements.
Then, we can write:

$$C_s(\mathcal{X}, \mathcal{Y}) = \frac{1}{|\mathcal{X}|\,|\mathcal{Y}|} \sum_{i=1}^{N} m(X_i)\, m(Y_i) \qquad (55)$$

Therefore, the cosine similarity index can be under- 
stood as a normalized version of the Jaccard index where 
the union between the multisets has been replaced by 
their respective product, normalized by the product of 
the number of elements in the two multisets. 

The above reasoning extends directly to two real mfunctions
f(x) and g(x):

$$C_s(f(x), g(x)) = \frac{1}{\int |f(x)|\,dx \int |g(x)|\,dx} \int f(x)\, g(x)\,dx$$
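A minimal sketch of this normalized product on sampled functions is given below, side by side with the intersection-based variant that the text introduces next, in which the pointwise product is replaced by the mproduct. The normalization by the areas of |f| and |g| follows the expression as reconstructed here, and the function names and the constant test functions are hypothetical.

```python
def sign(v):
    """Return the sign of v as -1, 0, or +1."""
    return (v > 0) - (v < 0)


def l1_area(values, h):
    """Riemann-sum estimate of the integral of |f| for samples spaced by h."""
    return h * sum(abs(v) for v in values)


def product_cosine(fs, gs, h):
    """Integral of f*g divided by the product of the areas of |f| and |g|."""
    num = h * sum(a * b for a, b in zip(fs, gs))
    return num / (l1_area(fs, h) * l1_area(gs, h))


def intersection_cosine(fs, gs, h):
    """Same normalization, with the pointwise product replaced by the mproduct."""
    num = h * sum(
        sign(a) * sign(b) * min(abs(a), abs(b)) for a, b in zip(fs, gs)
    )
    return num / (l1_area(fs, h) * l1_area(gs, h))


# Hypothetical example: two constant functions on [0, 1].
n = 100
h = 1.0 / n
f = [1.0] * n
g = [2.0] * n
print(product_cosine(f, g, h), intersection_cosine(f, g, h))
```

For these two constant functions the product-based index is insensitive to the difference in magnitude (it evaluates to about 1), whereas the intersection-based variant evaluates to about 0.5.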


Now, it becomes possible to adapt the cosine similarity to
intersection-based similarity by adopting the mproduct
proposed in Section 9, i.e.:

$$C_s(f(x), g(x)) = \frac{1}{\int |f(x)|\,dx \int |g(x)|\,dx} \int s_f s_g \min\left(s_f f(x),\, s_g g(x)\right) dx$$


11 Toward Hybrid Signal Processing


The definition of the set common product as an analogue of the
inner product binary operation between two mfunctions
paved the way to the development
of a field that we could denominate set signal processing, 
namely the handling and modification of signals by taking 
into account set operations including union and intersec- 
tion together with all other algebraic operations such as 
sum, subtraction, product, division, etc. of mfunctions. 
The resulting set of concepts and methods can then con- 
stitute what is henceforth called hybrid signal processing. 
This interesting subject is developed in the current sec- 
tion. 

We have already seen that the set common product plays a role
analogous to the inner product. This enables us to define the
mconvolution, namely the convolution between two mfunctions
(also extended to other msets), as:

$$\mathrm{mconv}(f, g)[y] = \int_{-\infty}^{\infty} s_f s_g \min\left(s_f f(x),\, s_g g(x-y)\right) dx \qquad (56)$$

with $y \in \mathbb{R}$ and $s_g$ being the sign of g(x − y).
Observe that, as with the traditional function convolution,
the mconvolution is commutative.

It can be readily verified that the mconvolution quantifies
the intersection areas between the functions, suitably adapted
to take into account negative multiplicities.

Because the mconvolution does not penalize the difference
between the two functions at a given position y with respect
to the areas that are not overlapped, it becomes interesting
to define a set similarity mconvolution, henceforth referred
to as sconv, as follows:

$$\mathrm{sconv}(f, g)[y] = \frac{\ll f(x), g(x-y) \gg}{f(x) \otimes g(x-y)} \qquad (57)$$

which, interestingly, is precisely the same as:

$$\mathrm{sconv}(f, g)[y] = J(f(x), g(x-y)) \qquad (58)$$





Observe that, since we are using x — y as argument 
of g() in the two above equations, the respective binary 
operations are more related to the correlation between 
two functions than to the convolution. 

Thus, in case a stricter match is desired, the similarity
mconvolution should be used instead of the mconvolution.
Observe that the above approach extends immediately to scalar
fields of any dimension as well as to generalizations of the
Jaccard index such as those described in [16].

The definition of these mconvolutions paves the way 
to theoretical and applied developments in all areas that 
already employ the traditional function convolution, in- 
cluding filtering, control theory, signal and image anal- 
ysis, template matching, to mention but a few possibili- 
ties. This potential is illustrated with respect to template 
matching in the following. 


The operation of template matching consists of quantifying,
given a function f(x), the similarity along x between its
portions and another reference template function g(x). This
can be immediately implemented by applying the mconvolution or
similarity mconvolution to those two functions. High resulting
values indicate portions of f(x) that are similar to g(x).
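A discrete sketch of this template matching procedure follows. It assumes sampled functions stored as lists, a template that is zero outside its own samples, shifts restricted to positions where the template fits entirely inside the signal, and sums taken over the whole signal. The names `mconv` and `sconv` follow the labels used in the text, while the helper `shifted` and the example data are hypothetical.

```python
def sign(v):
    """Return the sign of v as -1, 0, or +1."""
    return (v > 0) - (v < 0)


def shifted(g, y, n):
    """Template g placed at offset y inside a length-n signal, zero elsewhere."""
    return [g[x - y] if 0 <= x - y < len(g) else 0.0 for x in range(n)]


def mconv(f, g):
    """Discrete mconvolution: summed mproduct of f and the shifted template."""
    out = []
    for y in range(len(f) - len(g) + 1):
        gy = shifted(g, y, len(f))
        out.append(
            sum(sign(a) * sign(b) * min(abs(a), abs(b)) for a, b in zip(f, gy))
        )
    return out


def sconv(f, g):
    """Discrete similarity mconvolution: the mconvolution divided by the
    summed pointwise maximum of the magnitudes (a Jaccard index per shift)."""
    out = []
    for y, num in enumerate(mconv(f, g)):
        gy = shifted(g, y, len(f))
        den = sum(max(abs(a), abs(b)) for a, b in zip(f, gy))
        out.append(num / den if den else 0.0)
    return out


signal = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]  # hypothetical f(x)
template = [1.0, 2.0, 1.0]                         # hypothetical g(x)
m = mconv(signal, template)
s = sconv(signal, template)
best = max(range(len(s)), key=s.__getitem__)       # shift with the highest similarity
print(m, s, best)
```

In this example both curves peak at the shift where the template coincides with the bump in the signal, with the similarity version reaching 1 there and decaying faster away from the match.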

Figure 7 presents the result of matching the template in (b)
with the function in (a) by using mconvolution (c) and
similarity mconvolution (d).


[Figure 7 image: panels (a) f(x), (b) g(x), (c) mconv(x), (d) sconv(x).]

Figure 7: Template matching through mconvolution and
sconvolution. The template in (b) is to be compared with the
mfunction in (a). This can be effectively achieved by using
the mconvolution between these two functions, whose result is
shown in (c). The sconvolution (actually the Jaccard index for
continuous functions) can also be employed in case a stricter
quantification of the local similarity is being sought (d).


12 Generalized Multisets and Logic





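Set operations are naturally associated to Boolean
expressions. For instance, given three sets A, B, and C,
consider:

$$\begin{array}{lcl}
\text{Dataset Domain} & \Longleftrightarrow & \text{Model Domain} \\
A = B^C & \Longleftrightarrow & m_A = \neg m_B \\
A = B \cup C & \Longleftrightarrow & m_A = m_B \vee m_C \\
A = B \cap C & \Longleftrightarrow & m_A = m_B \wedge m_C \\
A = B - C = C^C \cap B & \Longleftrightarrow & m_A = m_B \wedge \neg m_C
\end{array}$$

where $m_A$, $m_B$, and $m_C$ are models associated to the
datasets A, B, and C.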

It is this intrinsic association, so natural to humans to the
point of sometimes not being realized, that often provides the
basis for developing new models (logical combinations of
models) while considering the adherence between respective
datasets.

The generalization of the concept of multisets to several 
mathematical structures, as well as the identification of 
their relationship with several types of similarity indices, 
paves the way to applying multisets and mfunctions ef- 
fectively in modeling activities. 

As sets are naturally expanded to virtually every other
mathematical structure, it becomes a topic of particular
interest to identify which are the logical constructs
corresponding to multiset operations such as sum, subtraction,
product and division. Probably, these logic concepts have to
do with the incorporation of the multiplicities implied by
multisets into the logic reasoning, suggesting a logic with
weighted or graded Boolean variables. For instance, the
statement “two apples united with three pears” (multiset
union) has the direct logic meaning of “apples and pears”,
whereas “two apples plus three pears” (multiset sum) would
mean “two apples and three pears”.
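The difference between multiset union and multiset sum can be tried out with Python's standard `collections.Counter`, which behaves as a multiset. The fruit counts below are hypothetical and deliberately share the same support, so that union and sum give different results; this is only an illustration of the operations, not of the logic constructs being sought.

```python
from collections import Counter

a = Counter(apple=2, pear=1)
b = Counter(apple=1, pear=3)

union = a | b          # maximum multiplicity per element
intersection = a & b   # minimum multiplicity per element
total = a + b          # multiplicities are added
difference = a - b     # elements with non-positive multiplicity are dropped

signed = Counter(a)
signed.subtract(b)     # in-place subtraction that keeps negative multiplicities

print(union, intersection, total, difference, signed)
```

The `subtract` method retains the negative count for pears, which is close in spirit to the negative multiplicities adopted in this work, whereas the `-` operator discards them, as in the traditional multiset subtraction.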

The three levels of purported relationships between two 
sets, propositions, or functions can be summarized as in 
Table 1. 


| logic           | ¬  | ∧ | ∨ |
| set theoretical | ^C | ∩ | ∪ |
| algebraic       | −  | × | + |

Table 1: The purported interrelationships between three basic
operations at the logic, set, and algebraic levels.


13 Multisets in Pattern Recognition


The possibility to use msets to represent any type of density
paves the way to interesting applications in pattern
recognition and deep learning (e.g. [7, 8, 9, 10, 11]). In
this section we illustrate how msets and the Jaccard index can
be readily applied in order to quantify the similarity between
two (or more) clusters represented by respective density
functions.

Let’s consider the three sets of points in the scatterplot
shown in Figure 8, which correspond to the three species of
iris flower in the frequently adopted iris dataset. Only two
out of their four features have been chosen in the following
example for simplicity’s sake.

The densities obtained from the respective discrete samples
through Gaussian kernel expansion are shown in Figure 9.


[Figure 8 image: scatterplot with legend entries setosa, versicolor, and virginica.]

Figure 8: A scatterplot representing the distribution of three
types of iris flowers represented by two respective features x
and y.


[Figure 9 image: panels (a), (b), and (c).]

Figure 9: The three density scalar fields obtained by Gaussian
kernel expansion of each of the three categories.


The multiset Jaccard indices obtained for each pairwise
combination of categories are presented in Table 2.


|            | setosa  | versicolor | virginica |
| setosa     | 1       | 2.6e-05    | 0         |
| versicolor | 2.6e-05 | 1          | …         |
| virginica  | 0       | …          | 1         |

Table 2: The Jaccard indices obtained for pairwise combinations
between the three iris species. The resolution has been
limited to 6 digits.


The obtained results are fully compatible with the
interrelationships between the three densities, or clusters,
in Figure 8. In addition, the threewise Jaccard index from
Equation 45 results nearly null, indicating a very small
chance that the three densities correspond to the same
cluster.
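The procedure used in this section can be sketched on a small scale as follows. The three point sets are hypothetical stand-ins rather than the iris measurements, and the kernel width, the grid, and the function names are likewise assumptions made here; the densities are nonnegative, so the Jaccard index reduces to the sum of pointwise minima over the sum of pointwise maxima.

```python
import math


def gaussian_density(points, grid, sigma):
    """Sum of isotropic Gaussian kernels centred on the points, evaluated on
    the grid and normalized to unit sum."""
    values = []
    for gx, gy in grid:
        v = 0.0
        for px, py in points:
            v += math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2.0 * sigma ** 2))
        values.append(v)
    total = sum(values)
    return [v / total for v in values]


def jaccard(p, q):
    """Jaccard index for nonnegative densities sampled on the same grid."""
    return sum(map(min, p, q)) / sum(map(max, p, q))


# Hypothetical 2D samples standing in for three clusters (not the iris data).
cluster_a = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2)]
cluster_b = [(0.5, 0.1), (0.7, 0.0), (0.6, 0.3), (0.8, -0.1)]
cluster_c = [(5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.1, 4.8)]

step = 0.1
grid = [(-2.0 + i * step, -2.0 + j * step) for i in range(91) for j in range(91)]

d_a = gaussian_density(cluster_a, grid, sigma=0.3)
d_b = gaussian_density(cluster_b, grid, sigma=0.3)
d_c = gaussian_density(cluster_c, grid, sigma=0.3)

j_ab = jaccard(d_a, d_b)  # neighbouring clusters: partial overlap
j_ac = jaccard(d_a, d_c)  # distant clusters: practically no overlap
print(j_ab, j_ac)
```

With these stand-in clusters the two neighbouring densities yield an intermediate Jaccard index, while the distant pair yields a value that is practically null, mirroring the qualitative pattern reported for the iris species.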


14 Concluding Remarks 


The fascinating subject of multisets has been presented 
in a hopefully introductory manner. 

We started with a brief review of traditional sets and their
properties, which was followed by a progressive presentation
of multisets and their characteristics. The possibility of
obtaining multiset generalizations capable of dealing with
functions, scalar fields, and densities was then described and
illustrated.

In addition to introducing several of the basic mset con- 
cepts, the present work also proposed how the universe 
mset can be defined in a robust manner by allowing the 
subtraction of msets to take negative values. This paved 
the way for recovering several properties analogous to tra- 
ditional sets involving the complement operation, includ- 
ing the De Morgan theorem. 

The extension of msets to mfunctions, namely traditional real
functions represented as msets, was also proposed, paving the
way to defining mfunctionals, of which the Jaccard index for
mfunctions is one example. These include an operation
analogous to the traditional inner product, but involving set
operations, which in turn allowed respective transformations
analogous to the Fourier transform to be defined, though
devoid of orthogonality.

Having defined an operation analogous to the inner product
immediately enabled us to propose binary operators that are
mset and mfunction counterparts of the traditional convolution
between two functions, paving the way to a field that can be
called hybrid signal processing, characterized by the
incorporation of mset and mfunction counterparts to most of
the concepts and operators adopted in signal processing.

The extension of the Jaccard index, which is intrinsically
related to set theory, to msets, as well as to the
consideration of more than two sets, was also presented, which
paves the way to employing these combined concepts for the
characterization of relationships between clusters in feature
spaces, a problem that is common to both the areas of pattern
recognition and deep learning.





The presented concepts and methods can lead to several
interesting applications, also motivating further integration
of the structures and properties of the domains of set theory,
propositional logic, and analysis.